Topically aware word suggestions

ABSTRACT

Concepts and technologies are described herein for providing topically aware word suggestions. Using a text input, the system determines a conditional count and an unconditional count. The system then determines an adjustment factor for a pair of words of the plurality of words based on the unconditional count and the conditional count. The system then generates a data structure defining a plurality of word clusters. The system then reconstructs the adjustment factor of the pair of words based on a number of common clusters between individual words of the pair of words. The adjustment factor is combined with other data, such as data from a language model dictionary and a freshness factor from an average cluster activation state table to determine a probability associated with a word candidate, which is displayed to a user.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/126,307 filed on Feb. 27, 2015, entitled “TOPICALLY AWARE WORD SUGGESTIONS,” the entirety of which is expressly incorporated herein by reference.

BACKGROUND

As users enter text on a computing device, such as a phone, some technologies provide suggestions on a word they may be trying to type or a word that may come next in the sentence. To generate word suggestions, there are a number of technologies that are designed to identify relevant words. For instance, some models analyze common sequences of words in a data set, and when a specific word of a sequence is entered in a device, a word that typically follows the specific word is suggested to the user. In one example, if a user enters the word “heart,” most systems using this sequence-based technology would suggest the word “attack” since sample sets may indicate that sequence of words.

Other technologies may use user personalization data to generate word suggestions. For example, a device may store text data from a user's input. The device may then analyze words or sequences of words that are frequently used by a particular user to suggest words to a user.

Although existing technologies provide word suggestions, there is room for improvement. For example, existing technologies are unaware of the context of the user's input and/or other text related to the input. The analysis of word sequences simply cannot interpret a broader meaning to provide a contextually relevant suggestion.

It is with respect to these and other considerations that the disclosure made herein is presented.

SUMMARY

Concepts and technologies are described herein for providing topically aware word suggestions. In one aspect, a system is configured to receive an input containing a plurality of words. Using the input, the system determines a conditional count and an unconditional count. The system then determines an adjustment factor for a pair of words of the plurality of words based on the unconditional count and the conditional count. The system then generates a data structure defining a plurality of word clusters, where the individual word clusters of the plurality of word clusters include at least one word of the plurality of words. The system then reconstructs the adjustment factor of the pair of words based on a number of common clusters between individual words of the pair of words. The adjustment factor is combined with other data, such as data from a language model dictionary and a freshness factor from an average cluster activation state table to determine a probability associated with a word candidate. One or more word candidates are displayed to a user based on the probability.

The techniques described herein utilize data from a number of sources to provide automatic inclusion of contextual awareness to a text input, which allows implementations to dynamically identify topics and provide word suggestions based on the topics. According to various embodiments, data structures may store usage data particular to an application and a person. Data from both data structures is used to find a grouping of topically relevant words. Based on one or more calculated probabilities, a word candidate is selected from the grouping of topically relevant words.

It should be appreciated that the above-described subject matter may also be implemented as a computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing several example components for providing topically aware word suggestions;

FIG. 2 is a flow diagram illustrating aspects of one illustrative routine for processing data used to provide topically aware word suggestions;

FIGS. 3A and 3B describe a routine that may be used during use of a device storing and utilizing a correlation table;

FIG. 4 is an example of a structure showing correlations between words and word clusters;

FIG. 5 is a flow diagram illustrating aspects of an example routine for processing cluster data;

FIG. 6 is a computer architecture diagram illustrating an illustrative computer hardware and software architecture for a computing system capable of implementing aspects of the techniques and technologies presented herein.

FIG. 7 is a diagram illustrating a distributed computing environment capable of implementing aspects of the techniques and technologies presented herein.

FIG. 8 is a computer architecture diagram illustrating a computing device architecture for a computing device capable of implementing aspects of the techniques and technologies presented herein.

DETAILED DESCRIPTION

The technologies described herein provide topically aware word suggestions. In one aspect, a system is configured to receive an input containing a plurality of words. Using the input, the system determines a conditional count and an unconditional count. The system then determines an adjustment factor for a pair of words of the plurality of words based on the unconditional count and the conditional count. The system then generates a data structure defining a plurality of word clusters, where the individual word clusters of the plurality of word clusters include at least one word of the plurality of words. The system then reconstructs the adjustment factor of the pair of words based on a number of common clusters between individual words of the pair of words. The adjustment factor is combined with other data, such as data from a language model dictionary and a freshness factor from an average cluster activation state table to determine a probability associated with a word candidate. One or more word candidates are displayed to a user based on the probability.

The techniques described herein utilize data from a number of sources to provide automatic inclusion of contextual awareness to a text input, which allows implementations to dynamically identify topics and provide word suggestions based on the topics. According to various embodiments, data structures may store usage data particular to an application and a person. Data from both data structures is used to find a grouping of topically relevant words. Based on one or more calculated probabilities, a word candidate is selected from the grouping of topically relevant words.

While the subject matter described herein is presented in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.

In the following detailed description, references are made to the accompanying drawings that form a part hereof, and which are shown by way of illustration specific embodiments or examples. Referring now to the drawings, in which like numerals represent like elements throughout the several figures, aspects of a computing system and methodology for providing topically aware word suggestions will be described.

FIG. 1 is a system diagram showing aspects of one illustrative mechanism disclosed herein for providing topically aware word suggestions. As shown in FIG. 1, a system 100 may include a remote computer 101, a computing device 110 and a network 120. The computing device 110 may operate as a stand-alone device, or the computing device 110 may operate in conjunction with the remote computer 101. As can be appreciated, the remote computer 101 and the computing device 110 are interconnected through one or more local and/or wide area networks, such as the network 120. It should be appreciated that many more network connections may be utilized than illustrated in FIG. 1.

The computing device 110 may include a local memory 180 that stores the input data 103, a language model dictionary 113, an output 115 and other data described herein. The computing device 110 may also include a program module 111 configured to manage interactions between a user and the computing device 110. The program module 111 may be in the form of a game application, an office productivity application, an operating system component or any other application with features that interact with the user via speech or text communication.

The computing device 110 might also include a speech module 113 that is configured to operate in conjunction with a microphone 116 and a speaker 117. The speech module 113 may include mechanisms for converting user speech into a computer-readable format, such as a text or binary format. As can be appreciated, the speech module 113 may include a number of known techniques for converting a user's voice to a computer-readable format. Text may also be received from a user through the input device 119, which may include any device for receiving text. This may include a soft keyboard on a display interface, a hardware keyboard, or any other device.

The speech module 113 may also operate in conjunction with a prediction service 107 on the remote computer 101 to capture and interpret speech input received at the computing device 110. As can be appreciated, the speech service 107 may utilize resources of a multiple-computer system to translate, transcribe, or otherwise interpret any type of speech input. The computing device 110 may also include an interface 118, which may be in the form of a visual display for communicating text and graphics to the user. The computing device 110 may also include an input device 119, which may be in the form of a keyboard or any other type of hardware for receiving any form of user input to the program module 111.

In some illustrative examples, the program module 111 is a software component of an operating system, an application that may include any generic function, such as a word processing application or an email application, or an application that provides a specialty function, such as a baseball application or a fantasy football application. The program module 111 may be configured to operate with the input device 119 and/or the speech module 113 so that a user can provide text entries using a keyboard or by any other form of communication, such as speech or movement gestures. In embodiments such as those described above, techniques disclosed herein can be utilized to enhance a user experience by suggesting words to a user while the user is entering text. As described in more detail below, the techniques described herein identify contextually relevant topics and words related to the identified topics. As can be appreciated, the examples of the program module 111 described above are provided for illustrative purposes and are not to be construed as limiting.

The remote computer 101 may be in the form of a server computer or a number of server computers configured to store the input data 103, a language model dictionary 113, an output 115 and other information associated with the user and related applications. As can be appreciated, the remote computer 101 may store duplicate copies of the data stored on the computing device 110, allowing a centralized service to coordinate a number of client computers, such as the computing device 110.

Turning now to FIG. 2, aspects of a routine 200 for providing topically aware word suggestions are shown and described below. It should also be understood that the operations disclosed herein are not necessarily presented in any particular order and that performance of some or all of the operations in an alternative order(s) is possible and is contemplated. The operations have been presented in the demonstrated order for ease of description and illustration. Operations may be added, omitted, and/or performed simultaneously, without departing from the scope of the appended claims.

It also should be understood that the illustrated methods can be ended at any time and need not be performed in their entirety. Some or all operations of the methods, and/or substantially equivalent operations, can be performed by execution of computer-readable instructions included on computer-storage media, as defined below. The term “computer-readable instructions,” and variants thereof, as used in the description and claims, is used expansively herein to include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based, programmable consumer electronics, combinations thereof, and the like.

Thus, it should be appreciated that the logical operations described herein may be implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof.

As will be described in more detail below, in conjunction with FIG. 6, the operations of the routine 200 and other routines are described herein as being implemented, at least in part, by a program module, such as the program module 111 or other application programs shown in FIG. 6. Although the following illustration refers to the program module 111, it can be appreciated that the operations of the routine 200 may also be implemented in many other ways. For example, the routine 200 may be implemented by the use of a combination of program modules operating on both a client and a server. For example, one or more of the operations of the routine 200 may alternatively or additionally be implemented, at least in part, by the remote computer 101 hosting a service for providing text suggestions.

With reference to FIG. 2, the routine 200 begins at operation 202, where the program module 111 obtains input text data 103 that is used as a sample set. The input text data 103 may be in any format and may be from any resource. For example, the input text data 103 may be text files from an email system, an authoring application or any other application that may store, process or generate text. The input text data 103 may also include text from a specialty application, such as a baseball application. The input text data 103 may also include text associated with a particular user. The input text data 103 may be of any size. In some configurations, files or text chunks may include 14 words, and the system 100 may receive a number of these chunks and/or files.

As will be described herein, in some scenarios, the text does not have to be in a particular order. For instance, a specific sequence of words does not need to follow the sentences of an email for use with the techniques described herein. For instance, in some configurations, the input may be broken up into blocks of text. In such configurations, the blocks of text may be processed in any order. Regardless of the order of the blocks of text, techniques described herein illustrate how words are associated with a topic to produce a probability used for predicting words and providing word candidates. In addition, in some configurations, the sequence of words in the input text data 103 may be preserved for further processing. For instance, as described below, the input text data 103 may use a specific sequence of words to determine values, such as the conditional probabilities and other values.

Next, in operation 204, the system 100 determines an unconditional count for the words of the input text data 103. The unconditional count is a raw count of the words regardless of the ordering or context. In some configurations, the unconditional count may be derived from any source of data such as text blocks that come from applications or user profiles, for example.

TABLE 1
UNCONDITIONAL COUNT
it      was     the     best    of      times
1       1       1       1       1       1
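For illustrative purposes, the unconditional count of operation 204 may be sketched as follows in Python. The function name `unconditional_counts` and the whitespace-based, lowercasing tokenization are assumptions made for this sketch and are not specified by the disclosure.

```python
from collections import Counter


def unconditional_counts(blocks):
    """Count every word occurrence, ignoring ordering and context."""
    counts = Counter()
    for block in blocks:
        # Assumed tokenization: lowercase, split on whitespace.
        counts.update(block.lower().split())
    return counts


# The single block of text used in TABLE 1.
table_1 = unconditional_counts(["it was the best of times"])
```

In this sketch each of the six words of the block receives a count of one, as in TABLE 1.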

Then, in operation 206, the system 100 determines a conditional count for word pairs found in the input text data 103. In some configurations, this is a raw count that considers context. Specifically, in one example, it is a count of how many times a word appears after another word in the same block of text. In TABLE 2, for example, the count is how many times “best” shows up after “it.”

TABLE 2
CONDITIONAL COUNT
it, best      1
it, of        1
it, times     1
was, of       1
was, times    1
the, times    1
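For illustrative purposes, the conditional count of operation 206 may be sketched as follows in Python. The function name `conditional_counts` and the parameter `min_gap` are hypothetical. The disclosure states only that the count reflects how many times a word appears after another word in the same block; it does not state a minimum distance between the two words. A value of 3 for `min_gap` is used here only because it happens to yield the six pairs listed in TABLE 2.

```python
from collections import Counter


def conditional_counts(blocks, min_gap=1):
    """Count ordered pairs (first, later) where `later` follows `first`
    in the same block by at least `min_gap` positions."""
    counts = Counter()
    for block in blocks:
        words = block.lower().split()
        for i, first in enumerate(words):
            for later in words[i + min_gap:]:
                counts[(first, later)] += 1
    return counts


# Hypothetical gap of 3, chosen to match the pairs shown in TABLE 2.
table_2 = conditional_counts(["it was the best of times"], min_gap=3)
```

With the default `min_gap` of 1, every later word in the block is counted, which is the plainest reading of the description above.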

Next, at operation 208, the system 100 determines a value that indicates a correlation between two words in the input text data 103. The correlation between two words is quantified by a value referred to herein as an “adjustment factor,” which is the change in the probability of one word given the presence of the other word.

The adjustment factor may be calculated using a number of different techniques. For instance, the correlation between the words “best” and “times” may have an adjustment factor that is based on a process that combines at least two noise filters. One or more technologies can be used for combining noise filters, including a technology referred to as discounting. Although the disclosure herein describes certain ways to determine an adjustment factor, there may be a number of techniques for determining this value. For example, techniques described herein may utilize any technique for determining the adjustment factor based on the conditional count and the unconditional count.

As described herein, configurations may utilize any technique for determining a conditional probability by applying the adjustment factor to the unconditional probability. This allows for the contextual awareness of a related topic. The determined adjustment factor may be associated with word pairs and stored in a data structure having any format. For illustrative purposes, the data structure storing word pairs with the adjustment factor is referred to herein as a “correlation table.”
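For illustrative purposes, one possible way to derive an adjustment factor from the two counts is sketched below in Python. The disclosure leaves the technique open, so everything in this sketch is an assumption: the factor is taken to be the ratio of a discounted conditional probability to the unconditional probability, the fixed `discount` of 0.5 stands in for a single noise filter, and the names `adjustment_factor` and `apply_adjustment` are hypothetical.

```python
def adjustment_factor(pair_count, first_count, second_count, total_words,
                      discount=0.5):
    """Assumed form: discounted P(second | first) divided by P(second)."""
    if first_count <= 0 or second_count <= 0 or total_words <= 0:
        raise ValueError("counts must be positive")
    # Absolute discounting: subtract a fixed amount from the raw pair
    # count so that pairs seen only rarely are not over-trusted.
    conditional = max(pair_count - discount, 0.0) / first_count
    unconditional = second_count / total_words
    return conditional / unconditional


def apply_adjustment(unconditional_probability, factor):
    """Apply the factor to an unconditional probability, capped at 1."""
    return min(1.0, unconditional_probability * factor)


# Counts for the pair ("it", "best") from TABLE 1 and TABLE 2.
correlation_table = {("it", "best"): adjustment_factor(1, 1, 1, 6)}
```

A factor above one raises the probability of the second word when the first word is present, and a factor below one lowers it.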

Next, at operation 210, the system 100 determines a number of word clusters. In general, from the input text data 103, word combinations are grouped. The groupings may include any number of words, e.g., one word up to a larger number exceeding thousands of words. FIG. 4 illustrates an example of a number of word clusters 301-305. For illustrative purposes, each oval in dashed lines represents a word cluster. In this example, the first word cluster 301 includes the words “it” and “was,” the second word cluster 303 includes the words “best” and “times,” and the third cluster 305 includes the words “was,” “best,” and “times.” Of course, there may be more clusters than shown herein; these are provided for illustrative purposes.

Also shown in FIG. 4, a word correlation model 300 is provided to show the correlations between the words of the input text data 103. For example, this representation illustrates the relationships between words and a corresponding adjustment factor between each word. For example, the adjustment factor associated with the word pair “it” and “was” is equal to one (1), the adjustment factor associated with the word pair “best” and “times” is equal to two (2), and the adjustment factor associated with the word pair “was” and “times” is equal to two and a half (2.5), etc.

One example data structure representing the clusters may be represented by a table of data having cluster identifiers (“cluster IDs”) associated with each word in the clusters. For example, a table may include the following structure.

TABLE 3
WORD     List of cluster IDs containing the word
best     1, 17, 250, 117, 32
times    17, 214, 112, 1, 20
was      12, 34, 23, 18, 20

In this example, the word “best” is in five clusters with the IDs of “1, 17, 250, 117, 32” and the word “times” is in five clusters with the IDs of “17, 214, 112, 1, 20.”

In some configurations, the process of determining the clusters also involves a process of filtering data associated with the clusters. In general, the filtering process involves ranking data associated with the clusters and filtering the data that does not meet a threshold.

In some configurations, the adjustment factor that is associated with word pairs is used to determine a cluster density. A number of different techniques may be used to determine a cluster density, which represents the relevancy of each cluster. For example, the cluster density may be an average of each adjustment factor for words in a particular cluster. In some configurations, the cluster density is used to rank the cluster against other clusters. As described herein, those rankings are used by techniques herein to sort priority of clusters for relevancy. For example, the cluster density of the third cluster shown in FIG. 4 would be (1+2.5+2)/3≈1.83, given that the words in this cluster are “was,” “best” and “times.”
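For illustrative purposes, a cluster density computed as the average adjustment factor over the word pairs of a cluster may be sketched as follows in Python. The function name `cluster_density` and the use of unordered pairs as keys are assumptions. The factors for “was”/“times” and “best”/“times” are those given for FIG. 4; the factor of 1 for “was”/“best” is assumed here, since the text does not state it.

```python
from itertools import combinations


def cluster_density(cluster_words, correlation_table):
    """Average adjustment factor over all word pairs in the cluster."""
    pairs = list(combinations(sorted(cluster_words), 2))
    if not pairs:
        return 0.0
    # Pairs missing from the table contribute zero (an assumption).
    total = sum(correlation_table.get(frozenset(pair), 0.0) for pair in pairs)
    return total / len(pairs)


factors = {
    frozenset(("was", "best")): 1.0,    # assumed value
    frozenset(("was", "times")): 2.5,   # from FIG. 4
    frozenset(("best", "times")): 2.0,  # from FIG. 4
}
third_cluster_density = cluster_density({"was", "best", "times"}, factors)
```

Clusters may then be ranked against each other by sorting on this value.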

In some configurations, the clusters may be ranked by the use of other data. One example illustrating such configurations is shown in TABLE 3. As shown, the clusters are actually ranked by an average correlation between the word in question and the words contained in the cluster. For example, the average correlation between the word “times” and each individual word in cluster 17 is higher than the average correlation between the word “times” and the words in any other single cluster. This technique may be used as a distinct process from the techniques using the cluster density. As described herein, techniques using the cluster density compare all of the words in a cluster with one another, rather than comparing one word (which may or may not be in a cluster) with all of the words in the cluster. In some configurations, the techniques using the average correlation between a word and individual words in the clusters may be used in conjunction with the techniques using the cluster density.
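For illustrative purposes, ranking clusters by the average correlation between one word and the words of each cluster may be sketched as follows in Python. The function names are hypothetical, and treating a missing pair as a correlation of zero is an assumption. The clusters and factors are those described for FIG. 4.

```python
def average_correlation(word, cluster_words, correlation_table):
    """Average adjustment factor between `word` and each other word
    of the cluster."""
    others = [w for w in cluster_words if w != word]
    if not others:
        return 0.0
    total = sum(correlation_table.get(frozenset((word, w)), 0.0)
                for w in others)
    return total / len(others)


def rank_clusters_for_word(word, clusters, correlation_table):
    """Return cluster IDs ordered from most to least correlated."""
    return sorted(
        clusters,
        key=lambda cid: average_correlation(word, clusters[cid],
                                            correlation_table),
        reverse=True,
    )


fig4_clusters = {
    301: {"it", "was"},
    303: {"best", "times"},
    305: {"was", "best", "times"},
}
fig4_factors = {
    frozenset(("it", "was")): 1.0,
    frozenset(("best", "times")): 2.0,
    frozenset(("was", "times")): 2.5,
}
ranking_for_times = rank_clusters_for_word("times", fig4_clusters,
                                           fig4_factors)
```

Unlike the cluster density, this ranking depends on the word in question, so the same cluster may appear at different positions in the lists of different words.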

In addition, data of TABLE 3 may be filtered based on this generated data. For instance, the number of clusters associated with each word may be filtered based on the cluster density meeting a threshold. Thus, instead of using an exhaustive list of cluster IDs, computing resources and potentially network bandwidth may be saved by having this filtered version of the dataset. One or more techniques for processing the cluster data can be utilized. An illustrative example is shown in FIG. 5 and described in more detail below.

Next, at operation 212, the system 100 reconstructs the adjustment factor based on the determined cluster data. In some configurations, the number of correlations between two words may be used to reconstruct the adjustment factor. For example, in TABLE 3, the adjustment factor for the word pair “best” and “times” may be reconstructed based on the fact that there are two common clusters between the words, e.g., cluster 1 and cluster 17 are common clusters between these words.

In some configurations, as an optional feature, the reconstruction of the adjustment factor may be based on the ranking of the clusters of each word. In the current example of TABLE 3 involving the word pair “best” and “times,” not only can the number of common clusters be used to reconstruct the adjustment factor, the position of the correlating clusters may be used. For example, the cluster ID=17 and cluster ID=1 are ranked relatively high, thus, this ranking may have more impact than cluster ID=20, which is ranked relatively low.
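For illustrative purposes, reconstructing a correlation from the cluster lists of TABLE 3 may be sketched as follows in Python. The disclosure does not state how the number or position of common clusters maps to a numeric factor, so both the plain count and the reciprocal-rank weighting used here are assumptions, and the function names are hypothetical.

```python
def common_clusters(ranked_a, ranked_b):
    """Cluster IDs present in both ranked lists, in the order of ranked_a."""
    in_b = set(ranked_b)
    return [cid for cid in ranked_a if cid in in_b]


def reconstruct_factor(ranked_a, ranked_b, use_rank=False):
    """Score a word pair by its common clusters.

    Without `use_rank`, the score is the number of common clusters.
    With `use_rank`, each common cluster contributes the mean of its
    reciprocal positions in the two lists (an assumed weighting), so
    highly ranked clusters count for more.
    """
    shared = common_clusters(ranked_a, ranked_b)
    if not use_rank:
        return float(len(shared))
    score = 0.0
    for cid in shared:
        score += 0.5 * (1.0 / (1 + ranked_a.index(cid))
                        + 1.0 / (1 + ranked_b.index(cid)))
    return score


# Ranked cluster lists from TABLE 3.
table_3 = {
    "best": [1, 17, 250, 117, 32],
    "times": [17, 214, 112, 1, 20],
    "was": [12, 34, 23, 18, 20],
}
best_times = reconstruct_factor(table_3["best"], table_3["times"])
```

In this sketch the pair “best” and “times” shares clusters 1 and 17, while “times” and “was” share only the low-ranked cluster 20, so the rank-weighted score of the first pair is considerably higher.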

The techniques disclosed herein for determining a reconstructed adjustment factor, e.g., a correlation, for a word pair may use any process for determining a correlation or quantifiable relationship for any word pair using cluster data and/or data representing a cluster density. In operation 212, the reconstructed adjustment factors and the associated word pairs may be stored in a data structure, e.g., an output 115, of the process. One example of a correlation table having original adjustment factors is shown in TABLE 4. In this example, the value of 5491 increases the probability of occurrence of the word “best” based on the appearance of “it” by a factor of about one million. An example of related techniques involving the determination of an activation coefficient is described below and summarized above.

TABLE 4
CORRELATION TABLE
it, best      5491.4153671374806
it, of        5402.7723901902709
it, times     5301.2354238547159
was, of       5280.9571079257323
was, times    5190.6010198867662
the, times    5168.6583668275723

As described in other sections herein, the data of the correlation table, the output 115, may be used by one or more techniques for adjusting a freshness value and one or more probabilities associated with word candidates. Thus, after operation 212, the routine 200 may transition into another routine described herein or data produced by routine 200 may be used by other routines and/or techniques described herein.

Generally described, FIG. 3A and FIG. 3B describe routines that may be used during use of a device storing and utilizing a correlation table. Specifically, FIG. 3A is an example routine 250 for generating a word candidate probability and suggesting a word candidate based on the probability when part of a word is entered by a user. FIG. 3B is an example routine 275 for updating data, such as a freshness value, based on the receipt of a full word.

Although these routines are shown in two separate diagrams, it can be appreciated that these techniques can be combined and run in the same program, and run in parallel. Thus, while a user is entering characters of an incomplete word, operations of routine 250 are used to find the word candidates. When a word candidate is selected or a full word is typed and entered, operations of routine 275 are used to update data that is used to suggest word candidates. By providing the benefits of both routines, contextually relevant topics may be identified while a user is entering text.

FIG. 3A illustrates an example process for providing word suggestions for text entries with partially complete words. The routine starts at operation 251, where a device receives an input text entry. As characters are entered, the characters are processed to generate word candidates using the techniques described herein.

When characters are received, the routine 250 proceeds at operation 253 where the system 100 determines one or more words associated with the text entry. For example, if the characters “P” and “I” are entered, that pattern is searched in one or more resources having words with the character combination of the input. In one example, a list of words may come from a dictionary or a language model dictionary. For example, an entry of “P” and “I” may return raw candidates from a dictionary or database, which may return the words “pie,” “piece,” “pire,” “pit,” “pizza,” and “pine.” For illustrative purposes, the words determined in operation 253 are also referred to herein as “raw candidates.”

Next, at operation 255, the system 100 identifies clusters associated with the raw candidates. The processing of operation 255 may include a search for the raw candidates within a data structure representing a number of clusters. The output of this operation may include a number of cluster IDs associated with the words. In the above example, the system 100 would produce cluster IDs for clusters containing the words “pie,” “piece,” “pire,” “pit,” “pizza,” and “pine.”
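For illustrative purposes, operations 253 and 255 may be sketched as follows in Python. The function names, the in-memory word set, and the cluster IDs assigned to each word are all hypothetical; a deployed configuration may instead query a language model dictionary or a database.

```python
def raw_candidates(prefix, dictionary):
    """Return dictionary words that begin with the entered characters."""
    prefix = prefix.lower()
    return sorted(word for word in dictionary if word.startswith(prefix))


def clusters_for(words, word_to_clusters):
    """Look up the cluster IDs associated with each raw candidate."""
    return {word: word_to_clusters.get(word, []) for word in words}


dictionary = {"pie", "piece", "pire", "pit", "pizza", "pine", "apple"}
# Hypothetical cluster assignments, for illustration only.
word_to_clusters = {"pie": [1, 4], "pizza": [1, 2], "pine": [3]}

candidates = raw_candidates("PI", dictionary)
candidate_clusters = clusters_for(candidates, word_to_clusters)
```

Words that belong to no cluster, such as “pit” in this sketch, simply map to an empty list and receive no topical adjustment in later operations.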

Next, at operation 257, the system 100 obtains a freshness factor based on the clusters determined in operation 255. In some configurations, the freshness factor may be derived from a data structure, such as the Cluster Activation State Table shown in TABLE 5.

TABLE 5
Cluster Activation State Table
Cluster ID      Freshness Value    Words
0         ->    0.0                [word 1, word 2, . . .]
1         ->    1.5                [word 3, word 4, . . .]
2         ->    −2.0               [word 5, word 6, . . .]
3         ->    −1.5               [word 7, word 8, . . .]
4         ->    −3.0               [word 9, word 10, . . .]
. . .
256       ->    −6.0               [word (n − 1), word n, . . .]

In this example, the data structure defining the cluster activation state may include a cluster ID and an associated freshness value. The data structure may also include the words or pointers to the words of the individual clusters. As will be described below, the freshness value indicates how recently a word of a cluster was entered into a device. Thus, in general, the freshness value will identify clusters having recently used words. In operation 257, by the use of the cluster IDs obtained in operation 255, the freshness value for each cluster is determined.

Next, at operation 259, the system 100 determines an adjustment factor based on the freshness value. The adjustment factor determined in operation 259 may be determined using any suitable technique that is based on the freshness value. In one example, for a particular raw candidate word, the freshness values for all associated clusters may be summed. In another example, the cluster activation state table may include a number of coefficients that may be applied as a multiplier to values used to determine the adjustment factor.
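For illustrative purposes, the summing example of operation 259 may be sketched as follows in Python, using the freshness values of TABLE 5. The function name `freshness_adjustment` and the choice to treat an unknown cluster ID as a freshness of zero are assumptions.

```python
def freshness_adjustment(cluster_ids, activation_table):
    """Sum the freshness values of all clusters containing a candidate."""
    return sum(activation_table.get(cid, 0.0) for cid in cluster_ids)


# Cluster ID -> freshness value, taken from TABLE 5.
activation_table = {0: 0.0, 1: 1.5, 2: -2.0, 3: -1.5, 4: -3.0, 256: -6.0}

# A hypothetical candidate belonging to clusters 1 and 0.
fresh_candidate = freshness_adjustment([1, 0], activation_table)
# A hypothetical candidate belonging to clusters 2 and 4.
stale_candidate = freshness_adjustment([2, 4], activation_table)
```

A candidate whose clusters were recently active receives a larger sum than a candidate whose clusters have not been used for some time.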

Then, at operation 261, the system 100 obtains a language model value from a language dictionary or another resource. In one example, a general text prediction dictionary assigns a probability to every word. In some configurations, this probability, which is referred to herein as a “usage value” or a “language model value,” is a raw probability with which a word is universally used in a null context. Based on the raw word candidates obtained in operation 253, the associated language model values may be obtained.

Then, at operation 265, the system 100 determines a candidate probability based on the language model value and the adjustment factor. Any technique for combining these values to determine a probability may be used in operation 265.

Next, at operation 267, the system 100 produces an output displaying a word candidate based on the candidate probability. It can be appreciated that some or all operations of routine 250 may be repeated to obtain a candidate probability for multiple word candidates. Multiple word candidates may be then displayed to a user. The display may sort the word candidates based on the candidate probability for individual word candidates, with the sorting positioning word candidates with the highest candidate probability near the beginning of a listing.
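For illustrative purposes, operations 265 and 267 may be sketched as follows in Python. The disclosure permits any technique for combining the language model value with the adjustment factor; this sketch assumes the adjustment acts as a log-scale offset, so the language model value is multiplied by the exponential of the adjustment and the results are normalized across candidates. The function names, the language model values and the adjustment values are hypothetical.

```python
import math


def candidate_probabilities(language_model, adjustments):
    """Scale each language model value by exp(adjustment), then normalize."""
    scores = {
        word: value * math.exp(adjustments.get(word, 0.0))
        for word, value in language_model.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {word: 0.0 for word in scores}
    return {word: score / total for word, score in scores.items()}


def ranked_candidates(probabilities, limit=3):
    """Order candidates so the most probable appear first in the listing."""
    return sorted(probabilities, key=probabilities.get, reverse=True)[:limit]


# Hypothetical values for three raw candidates.
language_model = {"pie": 0.4, "pizza": 0.3, "pine": 0.3}
adjustments = {"pizza": 1.5, "pine": -2.0}

probabilities = candidate_probabilities(language_model, adjustments)
suggestions = ranked_candidates(probabilities)
```

In this sketch “pizza” overtakes “pie” despite its lower language model value, because its clusters carry a positive freshness-based adjustment.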

Now turning to FIG. 3B, details of the example routine 275 for updating data, such as a freshness value, are described. As summarized above, when a word candidate is selected or a full word is typed and entered, operations of routine 275 are used to update data that is used to suggest word candidates.

The routine starts at operation 276, where a device receives an input including a full word. As noted above, this part of the process could include a user input where the user types in the full word or where the user selects a full word based on a suggestion. Any form of input may be used in this operation, including text received from another machine. The input may be from a keyboard and/or a gesture-based technology involving speech and/or movements of a user.

Next, at operation 278, a data structure defining a cluster activation state is updated based on the input. With reference to TABLE 5, a data structure defining the cluster activation state may include a cluster ID and an associated freshness value. The data structure may also include the words, or pointers to the words, of the individual clusters.

In operation 278, as a word is indicated, entered or selected by the input, a cluster containing the word is raised in priority. For example, with reference to TABLE 5, the freshness value of cluster ID=3 may be modified if [word 7] is received. The freshness value may be modified to a value indicating that the associated cluster is more relevant or current. Any value or technique for prioritizing clusters based on the timing of an input including an associated word may be used in operation 278.

In addition to adjusting the freshness value when a word is received, the system 100 may also continually modify the freshness values of the Cluster Activation State Table back to a normal point, e.g., a value of zero, over a period of time. This decay of the freshness value helps the system 100 monitor usage trends of certain words and helps distinguish current topics from topics that have not been raised recently.
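A minimal sketch of the update in operation 278 and the decay toward zero is shown below for illustrative purposes. The class name `ClusterActivationState`, the `boost` amount, and the multiplicative `decay` rate are hypothetical; any value or technique for raising and decaying the freshness value may be used.

```python
class ClusterActivationState:
    """Hypothetical table of cluster ID -> freshness value."""

    def __init__(self, cluster_words, boost=1.0, decay=0.9):
        self.cluster_words = cluster_words          # cluster ID -> set of words
        self.freshness = {cid: 0.0 for cid in cluster_words}
        self.boost = boost                          # assumed amount added per use
        self.decay = decay                          # assumed per-step decay rate

    def on_word(self, word):
        """Raise the freshness of every cluster that contains the word."""
        for cid, words in self.cluster_words.items():
            if word in words:
                self.freshness[cid] += self.boost

    def tick(self):
        """Move every freshness value back toward the normal point of zero."""
        for cid in self.freshness:
            self.freshness[cid] *= self.decay


state = ClusterActivationState({3: {"word 7"}, 4: {"word 9"}})
state.on_word("word 7")   # cluster 3 becomes fresher
state.tick()              # all clusters decay toward zero
print(state.freshness)
```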

In some configurations, techniques disclosed herein identify a default context, which is also referred to herein as a default topic. In general, the selection of a default context may be based on data structures defining cluster profiles. In one aspect, a cluster profile represents an average cluster activation state in a particular context. For instance, a cluster profile for a particular application, such as a fantasy football program, may store relevant data, such as the average cluster activation state, for each person using the application. In addition to storing a cluster profile for a particular application, the system 100 may store and update average cluster activation state data relevant to individual users.

Thus, when a person uses the application, the techniques described herein may utilize the cluster profile for the application and the cluster profile for the user. The data from each cluster profile may be averaged to identify and/or generate data that can be used to identify word candidates.
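As one illustrative sketch of averaging the data from an application profile and a user profile, each profile can be modeled as a mapping from cluster ID to average freshness value. The function name `average_profiles`, the equal weighting, and the treatment of a missing cluster as zero are assumptions of this sketch.

```python
def average_profiles(*profiles):
    """Average several cluster profiles (cluster ID -> average freshness).

    A cluster that is missing from a profile is treated as having a
    freshness of zero in that profile (an assumption of this sketch).
    """
    cluster_ids = set().union(*profiles)
    return {
        cid: sum(p.get(cid, 0.0) for p in profiles) / len(profiles)
        for cid in sorted(cluster_ids)
    }


app_profile = {1: 2.0, 2: 0.0}    # hypothetical profile for an application
user_profile = {1: 0.0, 3: 4.0}   # hypothetical profile for a user
print(average_profiles(app_profile, user_profile))
```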

These examples are provided for illustrative purposes only and are not to be construed as limiting, as any type of application may have an associated cluster profile. For instance, a cluster profile may be maintained for an email program for all users. As can be appreciated, the freshness value of the average cluster activation state data store may be updated as users provide text entries. Context related to the application and/or context related to the user may help identify more relevant topics and/or more relevant word candidates. The techniques described herein may access the one or more cluster profiles to obtain, update and/or generate a freshness value, which is used to obtain word candidates.

When utilizing clusters as described herein, there can be two stages in the process where values are utilized. For example, as described above, when a user types a word in a document, the techniques disclosed herein can change the activation coefficient. Then, when the word is typed a subsequent time, the techniques disclosed herein reference the activation coefficient.

A value can be used to quantify an association of a word with a cluster; this value is also referred to herein as the “activation coefficient.” For example, a word can have a “strong” association with a cluster, or a “weak” association with a cluster. In one specific example, if a word that is typed earlier in a document is strong with respect to a cluster and a subsequently typed word is strong with respect to the same cluster, then the effect is strong. In another example, if a word that is typed earlier in a document is weak with respect to a cluster and a subsequently typed word is weak with respect to the same cluster, then the effect is weak. In yet another example, if a word that is typed earlier in a document is strong with respect to a cluster and a subsequently typed word is weak with respect to the same cluster, then the effect is somewhere between strong and weak. The equations described herein enable the combination of two or more values while maintaining the same scale as the original value. The examples are provided for illustrative purposes and are not to be construed as limiting, as the two values can be combined in other ways, some of which may include the multiplication of the two values.

Now referring to FIG. 5, a flow diagram illustrating aspects of an example routine for processing cluster data is shown and described below. In some cases, clusters that are created by the techniques disclosed herein may be subject to further processing since they can have the property of having only one cluster per word. In one illustrative example, consider a sample data set where each word has four cluster slots.

The routine starts at operation 501 where a computing device sorts the clusters by a correlation. For example, given a word, a computer can sort a list of clusters in descending order of average correlation with that word. The clusters can be sorted into a list.

Next, at operation 503, a slot and a threshold can be established. In this example, the slot is set to 1 and the inclusion threshold is set to eighty (80). These values are provided for illustrative purposes and are not to be construed as limiting, as other suitable values can be used. Then, at operation 504, the computing device retrieves the top cluster off the list.

Next, at operation 505, the computing device determines if the correlation is greater than the threshold. If the correlation is not greater than the threshold, the routine 500 proceeds to operation 507, where the routine 500 proceeds to the next slot and lowers the threshold. The example shown in operation 507 is for illustrative purposes; the threshold can be reduced using any suitable technique. Next, at operation 509, the computing device determines if there are any remaining slots. If there are remaining slots, the routine 500 returns to operation 505. If there are no more remaining slots, the routine 500 terminates.

At operation 505, if the computing device determines that the correlation is greater than the threshold, the routine 500 proceeds to operation 511 where the computing device fills the slot with the cluster. Following operation 511, at operation 513, the computing device moves to the next slot. In this example, the slot is incremented and the threshold is lowered. Again, the example of FIG. 5 is provided for illustrative purposes; the threshold can be reduced by any suitable value.

Next, at operation 515, the computing device determines if there are any remaining slots. If there are remaining slots, the routine 500 returns to operation 504 where the next cluster is retrieved from the list. If there are no more remaining slots, the routine 500 terminates.
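For illustrative purposes, the flow of routine 500 can be sketched as follows. The function name `fill_slots`, the halving of the threshold at each slot (which reproduces the example thresholds of 80, 40, 20, and 10), and the decision to stop when the list of clusters is exhausted are assumptions of this sketch; the threshold can be reduced using any suitable technique.

```python
def fill_slots(correlations, num_slots=4, threshold=80.0):
    """Assign clusters to slots for one word.

    correlations: cluster ID -> average correlation of the cluster with the word.
    Returns a list with one entry per slot (a cluster ID, or None if empty).
    """
    # Operation 501: sort clusters in descending order of correlation.
    ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
    slots = [None] * num_slots
    slot = 0   # operation 503: start at the first slot
    index = 0  # operation 504: start with the top cluster on the list
    while slot < num_slots and index < len(ranked):
        cluster_id, correlation = ranked[index]
        if correlation > threshold:      # operation 505
            slots[slot] = cluster_id     # operation 511: fill the slot
            index += 1                   # operation 504: next cluster
        # Operations 507 and 513: next slot, lower threshold (halving assumed).
        slot += 1
        threshold /= 2
    return slots


print(fill_slots({"A": 90, "B": 30, "C": 15, "D": 5}))
# → ['A', None, 'B', 'C']
```

In this sketch, cluster “B” does not meet the threshold of 40 for the second slot, so that slot is left empty and “B” is compared again against the lower threshold of the third slot.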

Using the example described above and shown in FIG. 5, an example involving the determination of an activation coefficient is described below. In this example, the language model maintains a vector representing the “activation state” of each cluster. The premise is that the use of a word or a multi-word entity (MWE) should “activate” each of the clusters that contains that word or MWE, and subsequently suggestions in more-active clusters should be preferred over suggestions in less-active clusters. For illustrative purposes, an MWE refers to an n-gram that is a meaningful unit, the meaning of which is not strongly related to the meaning of its individual words (e.g., “heart attack,” “hot dog”).

Also, in this example, the computing device is configured to maintain a circular buffer of the last 50 words that are typed. Thus, when the 51st word is typed, the computing device can reverse the cluster activation of the 1st word before the computing device activates the 51st word's clusters. There are multiple ways to handle MWEs. In one technique, when an MWE is detected, the activation for the individual words that make up the MWE is reversed, since, for example, the meaning of “hot dog” is semantically unrelated to the meanings of “hot” and “dog.” This example is provided for illustrative purposes and is not to be construed as limiting. Other techniques may be used to control the activation so it does not grow without bound. For example, instead of the reverse method described above, an activation can decay over time. In one illustrative example, each time a word is typed, the activation can be reduced by a predetermined amount, e.g., two percent or another amount.
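A minimal sketch of the buffer of recently typed words is shown below for illustrative purposes. The class name `RecentWordActivation`, the fixed `boost` amount, and the use of a double-ended queue in place of a circular buffer are assumptions of this sketch, and MWE handling is omitted.

```python
from collections import defaultdict, deque


class RecentWordActivation:
    """Keeps cluster activation limited to the last `window` typed words."""

    def __init__(self, word_clusters, window=50, boost=1.0):
        self.word_clusters = word_clusters    # word -> list of cluster IDs
        self.window = window
        self.boost = boost                    # assumed activation per use
        self.recent = deque()                 # oldest word on the left
        self.activation = defaultdict(float)  # cluster ID -> activation

    def type_word(self, word):
        # When the buffer is full, reverse the activation of the oldest word
        # before activating the clusters of the new word.
        if len(self.recent) == self.window:
            oldest = self.recent.popleft()
            for cid in self.word_clusters.get(oldest, ()):
                self.activation[cid] -= self.boost
        for cid in self.word_clusters.get(word, ()):
            self.activation[cid] += self.boost
        self.recent.append(word)


tracker = RecentWordActivation({"goal": [1], "ball": [1], "tax": [2]}, window=2)
for typed in ["goal", "ball", "tax"]:
    tracker.type_word(typed)
print(dict(tracker.activation))
```

With a window of two words, typing “tax” reverses the activation contributed by “goal,” so the activation does not grow without bound.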

The cluster membership ranks from various techniques, including the example provided above, can determine both how much clusters are activated when a word is typed, and how much a suggested word's cost is affected by the activation level of its clusters. For example, when a word is typed the following routine can be used:

for each slot:
    if the word has a cluster in this slot:
        boost the activation of the cluster by the activation coefficient associated with this slot

When the language model generates a candidate:

for each slot:
    if the candidate has a cluster in this slot:
        adjust the cost of the candidate by the activation coefficient associated with this slot multiplied by the current activation of the cluster
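The two routines above can be sketched as follows for illustrative purposes. The function names, the representation of a word's slots as a list of cluster IDs (with `None` for an empty slot), and the coefficient values are hypothetical. The sketch returns the amount of the adjustment and leaves it to the caller to apply that amount to the cost of the candidate.

```python
def activate(activation, word_slots, coeffs):
    """When a word is typed: boost each of its clusters by the slot's coefficient."""
    for slot, cluster_id in enumerate(word_slots):
        if cluster_id is not None:
            activation[cluster_id] = activation.get(cluster_id, 0.0) + coeffs[slot]


def cost_adjustment(activation, candidate_slots, coeffs):
    """When a candidate is generated: sum coefficient times current activation."""
    total = 0.0
    for slot, cluster_id in enumerate(candidate_slots):
        if cluster_id is not None:
            total += coeffs[slot] * activation.get(cluster_id, 0.0)
    return total


coeffs = [4.0, 3.0, 2.0, 1.0]   # hypothetical activation coefficients per slot
activation = {}                 # cluster ID -> current activation
activate(activation, ["sports", None, "food", None], coeffs)
print(cost_adjustment(activation, ["food", "sports", None, None], coeffs))
# → 20.0
```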

The “activation coefficient” can be determined for each slot as follows. With reference to the example above and shown in FIG. 5, each slot can have a certain average correlation threshold for a cluster to be placed there, e.g., 80, 40, 20, and 10, respectively. It can be appreciated that a typical correlation of a word with a cluster in a particular slot will be halfway between the slot's threshold and the next highest threshold, which corresponds to multiplying the slot's threshold by 3/2 (e.g., 60 for a threshold of 40). In some configurations, the coefficient is the square root of this value, because a calculation can multiply by the coefficient twice: once when the cluster is activated and once when the cost of a candidate is adjusted.

${A\; C_{slot}} = \sqrt{\frac{3}{2}{threshold}_{slot}}$

For example, given a word with all four cluster slots filled, typing it once can reduce the cost of typing it a second time by

$\sum_{slot} \left( \sqrt{\frac{3}{2}\,{threshold}_{slot}} \right)^{2} = \frac{3}{2}(80 + 40 + 20 + 10) = 225$ log-prob points,

which, in a particular implementation with a given scale, could correspond to roughly doubling its probability of appearing in the next 50 words.
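The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and the thresholds are the illustrative values of 80, 40, 20, and 10 used above.

```python
import math


def activation_coefficient(threshold):
    """Square root of 3/2 times the slot's threshold."""
    return math.sqrt(1.5 * threshold)


def repeat_word_reduction(thresholds):
    """Cost reduction for retyping a word whose slots are all filled.

    The coefficient is applied once on activation and once on cost
    adjustment, so each slot contributes the coefficient squared.
    """
    return sum(activation_coefficient(t) ** 2 for t in thresholds)


thresholds = [80, 40, 20, 10]
print([round(activation_coefficient(t), 2) for t in thresholds])
print(round(repeat_word_reduction(thresholds), 6))
```

The per-slot coefficients come out to approximately 10.95, 7.75, 5.48, and 3.87, and the total reduction is 225 log-prob points, up to floating-point rounding.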

In some configurations, a process can normalize the activation state across all clusters, so that the sum of the activation of all clusters is always zero. Every time a cluster is activated by a certain amount, all other clusters are deactivated by a small amount in order to maintain this invariant. For example, if a process activates one cluster (out of 255) by “x” points, the process would adjust the activation of each of the other clusters by “−x/254” points. Without this normalization, there would be a probability advantage for a word to appear in any cluster, so the effective average probability of very common words which do not appear in any clusters (such as “the”) would decrease for no principled reason. By allowing some clusters to have a negative activation state, the techniques herein tell the language model to prefer common unclustered words over words in these clusters.
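For illustrative purposes, the zero-sum normalization can be sketched as follows. The function name `activate_zero_sum` and the representation of the activation state as a mapping from cluster ID to activation are assumptions of this sketch.

```python
def activate_zero_sum(activation, cluster_id, points):
    """Activate one cluster and deactivate the others to keep the sum at zero.

    activation: cluster ID -> activation, covering every cluster.
    """
    others = len(activation) - 1
    for cid in activation:
        if cid == cluster_id:
            activation[cid] += points
        else:
            # Each of the other clusters gives up an equal share.
            activation[cid] -= points / others


activation = {cid: 0.0 for cid in range(255)}   # 255 clusters, all at zero
activate_zero_sum(activation, cluster_id=7, points=254.0)
print(activation[7], activation[0], sum(activation.values()))
```

Activating one of 255 clusters by 254 points lowers each of the other 254 clusters by one point, and the sum of all activations remains zero.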

FIG. 6 shows additional details of an example computer architecture 600for a computer, such as the computing device 101 (FIG. 1), capable ofexecuting the program components described above for providing topicallyaware word suggestions. Thus, the computer architecture 600 illustratedin FIG. 6 illustrates an architecture for a server computer, mobilephone, a PDA, a smart phone, a desktop computer, a netbook computer, atablet computer, and/or a laptop computer. The computer architecture 600may be utilized to execute any aspects of the software componentspresented herein.

The computer architecture 600 illustrated in FIG. 6 includes a centralprocessing unit 602 (“CPU”), a system memory 604, including a randomaccess memory 606 (“RAM”) and a read-only memory (“ROM”) 608, and asystem bus 610 that couples the memory 604 to the CPU 602. A basicinput/output system containing the basic routines that help to transferinformation between elements within the computer architecture 600, suchas during startup, is stored in the ROM 608. The computer architecture600 further includes a mass storage device 612 for storing an operatingsystem 607, and one or more application programs including, but notlimited to, a program module 111 and an output 115.

The mass storage device 612 is connected to the CPU 602 through a massstorage controller (not shown) connected to the bus 610. The massstorage device 612 and its associated computer-readable media providenon-volatile storage for the computer architecture 600. Although thedescription of computer-readable media contained herein refers to a massstorage device, such as a solid state drive, a hard disk or CD-ROMdrive, it should be appreciated by those skilled in the art thatcomputer-readable media can be any available computer storage media orcommunication media that can be accessed by the computer architecture600.

Communication media includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

By way of example, and not limitation, computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer architecture 600. For purposes of the claims, the phrases “computer storage medium,” “computer-readable storage medium” and variations thereof do not include waves, signals, and/or other transitory and/or intangible communication media, per se.

According to various configurations, the computer architecture 600 mayoperate in a networked environment using logical connections to remotecomputers through the network 756 and/or another network (not shown).The computer architecture 600 may connect to the network 756 through anetwork interface unit 614 connected to the bus 610. It should beappreciated that the network interface unit 614 also may be utilized toconnect to other types of networks and remote computer systems. Thecomputer architecture 600 also may include an input/output controller616 for receiving and processing input from a number of other devices,including a keyboard, mouse, or electronic stylus (not shown in FIG. 6).Similarly, the input/output controller 616 may provide output to adisplay screen, a printer, or other type of output device (also notshown in FIG. 6).

It should be appreciated that the software components described hereinmay, when loaded into the CPU 602 and executed, transform the CPU 602and the overall computer architecture 600 from a general-purposecomputing system into a special-purpose computing system customized tofacilitate the functionality presented herein. The CPU 602 may beconstructed from any number of transistors or other discrete circuitelements, which may individually or collectively assume any number ofstates. More specifically, the CPU 602 may operate as a finite-statemachine, in response to executable instructions contained within thesoftware modules disclosed herein. These computer-executableinstructions may transform the CPU 602 by specifying how the CPU 602transitions between states, thereby transforming the transistors orother discrete hardware elements constituting the CPU 602.

Encoding the software modules presented herein also may transform thephysical structure of the computer-readable media presented herein. Thespecific transformation of physical structure may depend on variousfactors, in different implementations of this description. Examples ofsuch factors may include, but are not limited to, the technology used toimplement the computer-readable media, whether the computer-readablemedia is characterized as primary or secondary storage, and the like.For example, if the computer-readable media is implemented assemiconductor-based memory, the software disclosed herein may be encodedon the computer-readable media by transforming the physical state of thesemiconductor memory. For example, the software may transform the stateof transistors, capacitors, or other discrete circuit elementsconstituting the semiconductor memory. The software also may transformthe physical state of such components in order to store data thereupon.

As another example, the computer-readable media disclosed herein may beimplemented using magnetic or optical technology. In suchimplementations, the software presented herein may transform thephysical state of magnetic or optical media, when the software isencoded therein. These transformations may include altering the magneticcharacteristics of particular locations within given magnetic media.These transformations also may include altering the physical features orcharacteristics of particular locations within given optical media, tochange the optical characteristics of those locations. Othertransformations of physical media are possible without departing fromthe scope and spirit of the present description, with the foregoingexamples provided only to facilitate this discussion.

In light of the above, it should be appreciated that many types ofphysical transformations take place in the computer architecture 600 inorder to store and execute the software components presented herein. Italso should be appreciated that the computer architecture 600 mayinclude other types of computing devices, including hand-held computers,embedded computer systems, personal digital assistants, and other typesof computing devices known to those skilled in the art. It is alsocontemplated that the computer architecture 600 may not include all ofthe components shown in FIG. 6, may include other components that arenot explicitly shown in FIG. 6, or may utilize an architecturecompletely different than that shown in FIG. 6.

FIG. 7 depicts an illustrative distributed computing environment 700capable of executing the software components described herein forproviding topically aware word suggestions. Thus, the distributedcomputing environment 700 illustrated in FIG. 7 can be utilized toexecute any aspects of the software components presented herein. Forexample, the distributed computing environment 700 can be utilized toexecute aspects of the web browser 610, the content manager 105 and/orother software components described herein.

According to various implementations, the distributed computingenvironment 700 includes a computing environment 702 operating on, incommunication with, or as part of the network 704. The network 704 maybe or may include the network 756, described above with reference toFIG. 5. The network 704 also can include various access networks. One ormore client devices 706A-706N (hereinafter referred to collectivelyand/or generically as “clients 706”) can communicate with the computingenvironment 702 via the network 704 and/or other connections (notillustrated in FIG. 7). In one illustrated configuration, the clients706 include a computing device 706A such as a laptop computer, a desktopcomputer, or other computing device; a slate or tablet computing device(“tablet computing device”) 706B; a mobile computing device 706C such asa mobile telephone, a smart phone, or other mobile computing device; aserver computer 706D; and/or other devices 706N. It should be understoodthat any number of clients 706 can communicate with the computingenvironment 702. Two example computing architectures for the clients 706are illustrated and described herein with reference to FIGS. 6 and 8. Itshould be understood that the illustrated clients 706 and computingarchitectures illustrated and described herein are illustrative, andshould not be construed as being limited in any way.

In the illustrated configuration, the computing environment 702 includesapplication servers 708, data storage 710, and one or more networkinterfaces 712. According to various implementations, the functionalityof the application servers 708 can be provided by one or more servercomputers that are executing as part of, or in communication with, thenetwork 704. The application servers 708 can host various services,virtual machines, portals, and/or other resources. In the illustratedconfiguration, the application servers 708 host one or more virtualmachines 714 for hosting applications or other functionality. Accordingto various implementations, the virtual machines 714 host one or moreapplications and/or software modules for providing topically aware wordsuggestions. It should be understood that this configuration isillustrative, and should not be construed as being limiting in any way.The application servers 708 also host or provide access to one or moreportals, link pages, Web sites, and/or other information (“Web portals”)716.

According to various implementations, the application servers 708 alsoinclude one or more mailbox services 718 and one or more messagingservices 720. The mailbox services 718 can include electronic mail(“email”) services. The mailbox services 718 also can include variouspersonal information management (“PIM”) services including, but notlimited to, calendar services, contact management services,collaboration services, and/or other services. The messaging services720 can include, but are not limited to, instant messaging services,chat services, forum services, and/or other communication services.

The application servers 708 also may include one or more socialnetworking services 722. The social networking services 722 can includevarious social networking services including, but not limited to,services for sharing or posting status updates, instant messages, links,photos, videos, and/or other information; services for commenting ordisplaying interest in articles, products, blogs, or other resources;and/or other services. In some configurations, the social networkingservices 722 are provided by or include the FACEBOOK social networkingservice, the LINKEDIN professional networking service, the MYSPACEsocial networking service, the FOURSQUARE geographic networking service,the YAMMER office colleague networking service, and the like. In otherconfigurations, the social networking services 722 are provided by otherservices, sites, and/or providers that may or may not be explicitlyknown as social networking providers. For example, some web sites allowusers to interact with one another via email, chat services, and/orother means during various activities and/or contexts such as readingpublished articles, commenting on goods or services, publishing,collaboration, gaming, and the like. Examples of such services include,but are not limited to, the WINDOWS LIVE service and the XBOX LIVEservice from Microsoft Corporation in Redmond, Wash. Other services arepossible and are contemplated.

The social networking services 722 also can include commenting,blogging, and/or micro blogging services. Examples of such servicesinclude, but are not limited to, the YELP commenting service, the KUDZUreview service, the OFFICETALK enterprise micro blogging service, theTWITTER messaging service, the GOOGLE BUZZ service, and/or otherservices. It should be appreciated that the above lists of services arenot exhaustive and that numerous additional and/or alternative socialnetworking services 722 are not mentioned herein for the sake ofbrevity. As such, the above configurations are illustrative, and shouldnot be construed as being limited in any way. According to variousimplementations, the social networking services 722 may host one or moreapplications and/or software modules for providing the functionalitydescribed herein for providing topically aware word suggestions. Forinstance, any one of the application servers 708 may communicate orfacilitate the functionality and features described herein. Forinstance, a social networking application, mail client, messaging clientor a browser running on a phone or any other client 706 may communicatewith a networking service 722 and facilitate the functionality, even inpart, described above with respect to FIG. 4.

As shown in FIG. 7, the application servers 708 also can host other services, applications, portals, and/or other resources (“other resources”) 724. The other resources 724 can include, but are not limited to, document sharing, rendering or any other functionality. It thus can be appreciated that the computing environment 702 can provide integration of the concepts and technologies disclosed herein with various mailbox, messaging, social networking, and/or other services or resources.

As mentioned above, the computing environment 702 can include the datastorage 710. According to various implementations, the functionality ofthe data storage 710 is provided by one or more databases operating on,or in communication with, the network 704. The functionality of the datastorage 710 also can be provided by one or more server computersconfigured to host data for the computing environment 702. The datastorage 710 can include, host, or provide one or more real or virtualdatastores 726A-726N (hereinafter referred to collectively and/orgenerically as “datastores 726”). The datastores 726 are configured tohost data used or created by the application servers 708 and/or otherdata. Although not illustrated in FIG. 7, the datastores 726 also canhost or store web page documents, word documents, presentationdocuments, data structures, algorithms for execution by a recommendationengine, and/or other data utilized by any application program or anothermodule, such as the content manager 105. Aspects of the datastores 726may be associated with a service for storing files.

The computing environment 702 can communicate with, or be accessed by,the network interfaces 712. The network interfaces 712 can includevarious types of network hardware and software for supportingcommunications between two or more computing devices including, but notlimited to, the clients 706 and the application servers 708. It shouldbe appreciated that the network interfaces 712 also may be utilized toconnect to other types of networks and/or computer systems.

It should be understood that the distributed computing environment 700described herein can provide any aspects of the software elementsdescribed herein with any number of virtual computing resources and/orother distributed computing functionality that can be configured toexecute any aspects of the software components disclosed herein.According to various implementations of the concepts and technologiesdisclosed herein, the distributed computing environment 700 provides thesoftware functionality described herein as a service to the clients 706.It should be understood that the clients 706 can include real or virtualmachines including, but not limited to, server computers, web servers,personal computers, mobile computing devices, smart phones, and/or otherdevices. As such, various configurations of the concepts andtechnologies disclosed herein enable any device configured to access thedistributed computing environment 700 to utilize the functionalitydescribed herein for providing topically aware word suggestions, amongother aspects. In one specific example, as summarized above, techniquesdescribed herein may be implemented, at least in part, by the webbrowser application 510 of FIG. 6, which works in conjunction with theapplication servers 708 of FIG. 7.

Turning now to FIG. 8, an illustrative computing device architecture 800 is described for a computing device that is capable of executing various software components described herein for providing topically aware word suggestions. The computing device architecture 800 is applicable to computing devices that facilitate mobile computing due, in part, to form factor, wireless connectivity, and/or battery-powered operation. In some configurations, the computing devices include, but are not limited to, mobile telephones, tablet devices, slate devices, portable video game devices, and the like. The computing device architecture 800 is applicable to any of the clients 706 shown in FIG. 7. Moreover, aspects of the computing device architecture 800 may be applicable to traditional desktop computers, portable computers (e.g., laptops, notebooks, ultra-portables, and netbooks), server computers, and other computer systems, such as described herein with reference to FIG. 6. For example, the single touch and multi-touch aspects disclosed herein below may be applied to desktop computers that utilize a touchscreen or some other touch-enabled device, such as a touch-enabled track pad or touch-enabled mouse.

The computing device architecture 800 illustrated in FIG. 8 includes a processor 802, memory components 804, network connectivity components 806, sensor components 808, input/output components 810, and power components 812. In the illustrated configuration, the processor 802 is in communication with the memory components 804, the network connectivity components 806, the sensor components 808, the input/output (“I/O”) components 810, and the power components 812. Although no connections are shown between the individual components illustrated in FIG. 8, the components can interact to carry out device functions. In some configurations, the components are arranged so as to communicate via one or more busses (not shown).

The processor 802 includes a central processing unit (“CPU”) configuredto process data, execute computer-executable instructions of one or moreapplication programs, and communicate with other components of thecomputing device architecture 800 in order to perform variousfunctionality described herein. The processor 802 may be utilized toexecute aspects of the software components presented herein and,particularly, those that utilize, at least in part, a touch-enabledinput.

In some configurations, the processor 802 includes a graphics processingunit (“GPU”) configured to accelerate operations performed by the CPU,including, but not limited to, operations performed by executinggeneral-purpose scientific and/or engineering computing applications, aswell as graphics-intensive computing applications such as highresolution video (e.g., 720P, 1080P, and higher resolution), videogames, three-dimensional (“3D”) modeling applications, and the like. Insome configurations, the processor 802 is configured to communicate witha discrete GPU (not shown). In any case, the CPU and GPU may beconfigured in accordance with a co-processing CPU/GPU computing model,wherein the sequential part of an application executes on the CPU andthe computationally-intensive part is accelerated by the GPU.

In some configurations, the processor 802 is, or is included in, a system-on-chip (“SoC”) along with one or more of the other components described herein below. For example, the SoC may include the processor 802, a GPU, one or more of the network connectivity components 806, and one or more of the sensor components 808. In some configurations, the processor 802 is fabricated, in part, utilizing a package-on-package (“PoP”) integrated circuit packaging technique. The processor 802 may be a single core or multi-core processor.

The processor 802 may be created in accordance with an ARM architecture, available for license from ARM HOLDINGS of Cambridge, United Kingdom. Alternatively, the processor 802 may be created in accordance with an x86 architecture, such as is available from INTEL CORPORATION of Santa Clara, Calif. and others. In some configurations, the processor 802 is a SNAPDRAGON SoC, available from QUALCOMM of San Diego, Calif., a TEGRA SoC, available from NVIDIA of Santa Clara, Calif., a HUMMINGBIRD SoC, available from SAMSUNG of Seoul, South Korea, an Open Multimedia Application Platform (“OMAP”) SoC, available from TEXAS INSTRUMENTS of Dallas, Tex., a customized version of any of the above SoCs, or a proprietary SoC.

The memory components 804 include a random access memory (“RAM”) 814, a read-only memory (“ROM”) 816, an integrated storage memory (“integrated storage”) 818, and a removable storage memory (“removable storage”) 820. In some configurations, the RAM 814 or a portion thereof, the ROM 816 or a portion thereof, and/or some combination of the RAM 814 and the ROM 816 is integrated in the processor 802. In some configurations, the ROM 816 is configured to store a firmware, an operating system or a portion thereof (e.g., operating system kernel), and/or a bootloader to load an operating system kernel from the integrated storage 818 and/or the removable storage 820.

The integrated storage 818 can include a solid-state memory, a hard disk, or a combination of solid-state memory and a hard disk. The integrated storage 818 may be soldered or otherwise connected to a logic board upon which the processor 802 and other components described herein also may be connected. As such, the integrated storage 818 is integrated in the computing device. The integrated storage 818 is configured to store an operating system or portions thereof, application programs, data, and other software components described herein.

The removable storage 820 can include a solid-state memory, a hard disk, or a combination of solid-state memory and a hard disk. In some configurations, the removable storage 820 is provided in lieu of the integrated storage 818. In other configurations, the removable storage 820 is provided as additional optional storage. In some configurations, the removable storage 820 is logically combined with the integrated storage 818 such that the total available storage is made available as a total combined storage capacity. In some configurations, the total combined capacity of the integrated storage 818 and the removable storage 820 is shown to a user instead of separate storage capacities for the integrated storage 818 and the removable storage 820.

The removable storage 820 is configured to be inserted into a removable storage memory slot (not shown) or other mechanism by which the removable storage 820 is inserted and secured to facilitate a connection over which the removable storage 820 can communicate with other components of the computing device, such as the processor 802. The removable storage 820 may be embodied in various memory card formats including, but not limited to, PC card, CompactFlash card, memory stick, secure digital (“SD”), miniSD, microSD, universal integrated circuit card (“UICC”) (e.g., a subscriber identity module (“SIM”) or universal SIM (“USIM”)), a proprietary format, or the like.

It can be understood that one or more of the memory components 804 can store an operating system. According to various configurations, the operating system includes, but is not limited to, WINDOWS MOBILE OS from Microsoft Corporation of Redmond, Wash., WINDOWS PHONE OS from Microsoft Corporation, WINDOWS from Microsoft Corporation, PALM WEBOS from Hewlett-Packard Company of Palo Alto, Calif., BLACKBERRY OS from Research In Motion Limited of Waterloo, Ontario, Canada, IOS from Apple Inc. of Cupertino, Calif., and ANDROID OS from Google Inc. of Mountain View, Calif. Other operating systems are contemplated.

The network connectivity components 806 include a wireless wide area network component (“WWAN component”) 822, a wireless local area network component (“WLAN component”) 824, and a wireless personal area network component (“WPAN component”) 826. The network connectivity components 806 facilitate communications to and from the network 856 or another network, which may be a WWAN, a WLAN, or a WPAN. Although only the network 856 is illustrated, the network connectivity components 806 may facilitate simultaneous communication with multiple networks, including the network 604 of FIG. 6. For example, the network connectivity components 806 may facilitate simultaneous communications with multiple networks via one or more of a WWAN, a WLAN, or a WPAN.

The network 856 may be or may include a WWAN, such as a mobile telecommunications network utilizing one or more mobile telecommunications technologies to provide voice and/or data services to a computing device utilizing the computing device architecture 800 via the WWAN component 822. The mobile telecommunications technologies can include, but are not limited to, Global System for Mobile communications (“GSM”), Code Division Multiple Access (“CDMA”) ONE, CDMA2000, Universal Mobile Telecommunications System (“UMTS”), Long Term Evolution (“LTE”), and Worldwide Interoperability for Microwave Access (“WiMAX”). Moreover, the network 856 may utilize various channel access methods (which may or may not be used by the aforementioned standards) including, but not limited to, Time Division Multiple Access (“TDMA”), Frequency Division Multiple Access (“FDMA”), CDMA, wideband CDMA (“W-CDMA”), Orthogonal Frequency Division Multiplexing (“OFDM”), Space Division Multiple Access (“SDMA”), and the like. Data communications may be provided using General Packet Radio Service (“GPRS”), Enhanced Data rates for Global Evolution (“EDGE”), the High-Speed Packet Access (“HSPA”) protocol family including High-Speed Downlink Packet Access (“HSDPA”), Enhanced Uplink (“EUL”) or otherwise termed High-Speed Uplink Packet Access (“HSUPA”), Evolved HSPA (“HSPA+”), LTE, and various other current and future wireless data access standards. The network 856 may be configured to provide voice and/or data communications with any combination of the above technologies. The network 856 may be configured to or adapted to provide voice and/or data communications in accordance with future generation technologies.

In some configurations, the WWAN component 822 is configured to provide dual-multi-mode connectivity to the network 856. For example, the WWAN component 822 may be configured to provide connectivity to the network 856, wherein the network 856 provides service via GSM and UMTS technologies, or via some other combination of technologies. Alternatively, multiple WWAN components 822 may be utilized to perform such functionality, and/or provide additional functionality to support other non-compatible technologies (i.e., incapable of being supported by a single WWAN component). The WWAN component 822 may facilitate similar connectivity to multiple networks (e.g., a UMTS network and an LTE network).

The network 856 may be a WLAN operating in accordance with one or more Institute of Electrical and Electronics Engineers (“IEEE”) 802.11 standards, such as IEEE 802.11a, 802.11b, 802.11g, 802.11n, and/or a future 802.11 standard (referred to herein collectively as WI-FI). Draft 802.11 standards are also contemplated. In some configurations, the WLAN is implemented utilizing one or more wireless WI-FI access points. In some configurations, one or more of the wireless WI-FI access points are other computing devices with connectivity to a WWAN that are functioning as WI-FI hotspots. The WLAN component 824 is configured to connect to the network 856 via the WI-FI access points. Such connections may be secured via various encryption technologies including, but not limited to, WI-FI Protected Access (“WPA”), WPA2, Wired Equivalent Privacy (“WEP”), and the like.

The network 856 may be a WPAN operating in accordance with Infrared Data Association (“IrDA”), BLUETOOTH, wireless Universal Serial Bus (“USB”), Z-Wave, ZIGBEE, or some other short-range wireless technology. In some configurations, the WPAN component 826 is configured to facilitate communications with other devices, such as peripherals, computers, or other computing devices via the WPAN.

The sensor components 808 include a magnetometer 828, an ambient light sensor 830, a proximity sensor 832, an accelerometer 834, a gyroscope 836, and a Global Positioning System sensor (“GPS sensor”) 838. It is contemplated that other sensors, such as, but not limited to, temperature sensors or shock detection sensors, also may be incorporated in the computing device architecture 800.

The magnetometer 828 is configured to measure the strength and direction of a magnetic field. In some configurations, the magnetometer 828 provides measurements to a compass application program stored within one of the memory components 804 in order to provide a user with accurate directions in a frame of reference including the cardinal directions, north, south, east, and west. Similar measurements may be provided to a navigation application program that includes a compass component. Other uses of measurements obtained by the magnetometer 828 are contemplated.
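By way of illustration only, the following Python sketch shows one way a compass application program might convert magnetometer measurements into a cardinal direction. The function names, the axis convention (`forward` and `right` being the horizontal field components along the device's forward and right axes), and the assumption that the device is held flat are hypothetical and are not taken from this description.

```python
import math

# Cardinal directions in clockwise order starting at north.
CARDINALS = ["north", "east", "south", "west"]


def heading_degrees(forward: float, right: float) -> float:
    """Return the heading in degrees clockwise from magnetic north.

    Assumption: the device lies flat, `forward` is the horizontal field
    component along the device's forward axis, and `right` is the
    component along its right-hand axis.
    """
    # When the device faces east, magnetic north lies to its left, so the
    # right-axis component is negative; negating it yields +90 degrees.
    return math.degrees(math.atan2(-right, forward)) % 360.0


def cardinal(heading: float) -> str:
    """Map a heading in degrees to the nearest cardinal direction."""
    return CARDINALS[round(heading / 90.0) % 4]


if __name__ == "__main__":
    print(cardinal(heading_degrees(0.0, -1.0)))
```

A real compass application would typically also compensate for device tilt and for local magnetic declination, neither of which is modeled here.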

The ambient light sensor 830 is configured to measure ambient light. In some configurations, the ambient light sensor 830 provides measurements to an application program stored within one of the memory components 804 in order to automatically adjust the brightness of a display (described below) to compensate for low-light and high-light environments. Other uses of measurements obtained by the ambient light sensor 830 are contemplated.

The proximity sensor 832 is configured to detect the presence of an object or thing in proximity to the computing device without direct contact. In some configurations, the proximity sensor 832 detects the presence of a user's body (e.g., the user's face) and provides this information to an application program stored within one of the memory components 804 that utilizes the proximity information to enable or disable some functionality of the computing device. For example, a telephone application program may automatically disable a touchscreen (described below) in response to receiving the proximity information so that the user's face does not inadvertently end a call or enable/disable other functionality within the telephone application program during the call. Other uses of proximity as detected by the proximity sensor 832 are contemplated.

The accelerometer 834 is configured to measure proper acceleration. In some configurations, output from the accelerometer 834 is used by an application program as an input mechanism to control some functionality of the application program. For example, the application program may be a video game in which a character, a portion thereof, or an object is moved or otherwise manipulated in response to input received via the accelerometer 834. In some configurations, output from the accelerometer 834 is provided to an application program for use in switching between landscape and portrait modes, calculating coordinate acceleration, or detecting a fall. Other uses of the accelerometer 834 are contemplated.
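As a non-limiting illustration, the sketch below shows how an application program might use accelerometer output to switch between landscape and portrait modes and to detect a fall. The function names, the axis convention (x across the short edge of the screen, y along the long edge), and the free-fall threshold are assumptions made for this example only.

```python
import math

# Hypothetical threshold in m/s^2. A device in free fall measures a proper
# acceleration near zero, whereas a device at rest measures about 9.81.
FREE_FALL_THRESHOLD = 2.0


def screen_mode(ax: float, ay: float) -> str:
    """Choose a display mode from the in-plane accelerometer components.

    Assumption: `ay` runs along the long edge of the screen, so a reading
    dominated by `ay` means the device is being held upright.
    """
    return "portrait" if abs(ay) >= abs(ax) else "landscape"


def is_free_fall(ax: float, ay: float, az: float,
                 threshold: float = FREE_FALL_THRESHOLD) -> bool:
    """Report a possible fall when the measured magnitude is near zero."""
    return math.sqrt(ax * ax + ay * ay + az * az) < threshold


if __name__ == "__main__":
    print(screen_mode(0.3, 9.7), is_free_fall(0.1, 0.2, 0.1))
```

A practical implementation would likely filter the readings over time so that brief jolts do not flip the display mode or trigger a false fall report.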

The gyroscope 836 is configured to measure and maintain orientation. In some configurations, output from the gyroscope 836 is used by an application program as an input mechanism to control some functionality of the application program. For example, the gyroscope 836 can be used for accurate recognition of movement within a 3D environment of a video game application or some other application. In some configurations, an application program utilizes output from the gyroscope 836 and the accelerometer 834 to enhance control of some functionality of the application program. Other uses of the gyroscope 836 are contemplated.

The GPS sensor 838 is configured to receive signals from GPS satellites for use in calculating a location. The location calculated by the GPS sensor 838 may be used by any application program that requires or benefits from location information. For example, the location calculated by the GPS sensor 838 may be used with a navigation application program to provide directions from the location to a destination or directions from the destination to the location. Moreover, the GPS sensor 838 may be used to provide location information to an external location-based service, such as E911 service. The GPS sensor 838 may obtain location information generated via WI-FI, WIMAX, and/or cellular triangulation techniques utilizing one or more of the network connectivity components 806 to aid the GPS sensor 838 in obtaining a location fix. The GPS sensor 838 may also be used in Assisted GPS (“A-GPS”) systems.

The I/O components 810 include a display 840, a touchscreen 842, a data I/O interface component (“data I/O”) 844, an audio I/O interface component (“audio I/O”) 846, a video I/O interface component (“video I/O”) 848, and a camera 850. In some configurations, the display 840 and the touchscreen 842 are combined. In some configurations, two or more of the data I/O component 844, the audio I/O component 846, and the video I/O component 848 are combined. The I/O components 810 may include discrete processors configured to support the various interfaces described below, or may include processing functionality built-in to the processor 802.

The display 840 is an output device configured to present information in a visual form. In particular, the display 840 may present graphical user interface (“GUI”) elements, text, images, video, notifications, virtual buttons, virtual keyboards, messaging data, Internet content, device status, time, date, calendar data, preferences, map information, location information, and any other information that is capable of being presented in a visual form. In some configurations, the display 840 is a liquid crystal display (“LCD”) utilizing any active or passive matrix technology and any backlighting technology (if used). In some configurations, the display 840 is an organic light emitting diode (“OLED”) display. Other display types are contemplated.

The touchscreen 842, also referred to herein as a “touch-enabled screen,” is an input device configured to detect the presence and location of a touch. The touchscreen 842 may be a resistive touchscreen, a capacitive touchscreen, a surface acoustic wave touchscreen, an infrared touchscreen, an optical imaging touchscreen, a dispersive signal touchscreen, an acoustic pulse recognition touchscreen, or may utilize any other touchscreen technology. In some configurations, the touchscreen 842 is incorporated on top of the display 840 as a transparent layer to enable a user to use one or more touches to interact with objects or other information presented on the display 840. In other configurations, the touchscreen 842 is a touch pad incorporated on a surface of the computing device that does not include the display 840. For example, the computing device may have a touchscreen incorporated on top of the display 840 and a touch pad on a surface opposite the display 840.

In some configurations, the touchscreen 842 is a single-touch touchscreen. In other configurations, the touchscreen 842 is a multi-touch touchscreen. In some configurations, the touchscreen 842 is configured to detect discrete touches, single touch gestures, and/or multi-touch gestures. These are collectively referred to herein as gestures for convenience. Several gestures will now be described. It should be understood that these gestures are illustrative and are not intended to limit the scope of the appended claims. Moreover, the described gestures, additional gestures, and/or alternative gestures may be implemented in software for use with the touchscreen 842. As such, a developer may create gestures that are specific to a particular application program.

In some configurations, the touchscreen 842 supports a tap gesture in which a user taps the touchscreen 842 once on an item presented on the display 840. The tap gesture may be used for various reasons including, but not limited to, opening or launching whatever the user taps. In some configurations, the touchscreen 842 supports a double tap gesture in which a user taps the touchscreen 842 twice on an item presented on the display 840. The double tap gesture may be used for various reasons including, but not limited to, zooming in or zooming out in stages. In some configurations, the touchscreen 842 supports a tap and hold gesture in which a user taps the touchscreen 842 and maintains contact for at least a pre-defined time. The tap and hold gesture may be used for various reasons including, but not limited to, opening a context-specific menu.

In some configurations, the touchscreen 842 supports a pan gesture in which a user places a finger on the touchscreen 842 and maintains contact with the touchscreen 842 while moving the finger on the touchscreen 842. The pan gesture may be used for various reasons including, but not limited to, moving through screens, images, or menus at a controlled rate. Multiple finger pan gestures are also contemplated. In some configurations, the touchscreen 842 supports a flick gesture in which a user swipes a finger in the direction the user wants the screen to move. The flick gesture may be used for various reasons including, but not limited to, scrolling horizontally or vertically through menus or pages. In some configurations, the touchscreen 842 supports a pinch and stretch gesture in which a user makes a pinching motion with two fingers (e.g., thumb and forefinger) on the touchscreen 842 or moves the two fingers apart. The pinch and stretch gesture may be used for various reasons including, but not limited to, zooming gradually in or out of a website, map, or picture.

Although the above gestures have been described with reference to the use of one or more fingers for performing the gestures, other appendages such as toes or objects such as styluses may be used to interact with the touchscreen 842. As such, the above gestures should be understood as being illustrative and should not be construed as being limiting in any way.
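For illustration, the single-contact gestures described above (tap, tap and hold, pan, and flick) might be distinguished in software along the following lines. This is a minimal sketch under stated assumptions: the `Stroke` record, the three threshold constants, and the use of speed to separate a flick from a pan are hypothetical choices for this example and are not prescribed by this description. Double tap and pinch and stretch gestures, which involve multiple contacts, are omitted.

```python
from dataclasses import dataclass

HOLD_SECONDS = 0.5    # hypothetical "pre-defined time" for tap and hold
MOVE_PIXELS = 10.0    # hypothetical travel below which a contact is stationary
FLICK_SPEED = 1000.0  # hypothetical speed (pixels/second) separating flick from pan


@dataclass
class Stroke:
    """One continuous contact with the touchscreen (illustrative record)."""
    duration_s: float   # time between touch-down and touch-up
    distance_px: float  # total distance the contact point travelled


def classify(stroke: Stroke) -> str:
    """Classify a single-contact stroke as one of four gestures."""
    if stroke.distance_px < MOVE_PIXELS:
        # Stationary contact: duration decides between tap and tap and hold.
        return "tap and hold" if stroke.duration_s >= HOLD_SECONDS else "tap"
    # Moving contact: a fast swipe is treated as a flick, otherwise a pan.
    speed = stroke.distance_px / max(stroke.duration_s, 1e-6)
    return "flick" if speed >= FLICK_SPEED else "pan"


if __name__ == "__main__":
    print(classify(Stroke(duration_s=0.1, distance_px=300.0)))
```

Because the thresholds are ordinary constants, a developer could tune them per application program, consistent with the application-specific gestures mentioned above.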

The data I/O interface component 844 is configured to facilitate input of data to the computing device and output of data from the computing device. In some configurations, the data I/O interface component 844 includes a connector configured to provide wired connectivity between the computing device and a computer system, for example, for synchronization operation purposes. The connector may be a proprietary connector or a standardized connector such as USB, micro-USB, mini-USB, or the like. In some configurations, the connector is a dock connector for docking the computing device with another device such as a docking station, audio device (e.g., a digital music player), or video device.

The audio I/O interface component 846 is configured to provide audio input and/or output capabilities to the computing device. In some configurations, the audio I/O interface component 846 includes a microphone configured to collect audio signals. In some configurations, the audio I/O interface component 846 includes a headphone jack configured to provide connectivity for headphones or other external speakers. In some configurations, the audio I/O interface component 846 includes a speaker for the output of audio signals. In some configurations, the audio I/O interface component 846 includes an optical audio cable out.

The video I/O interface component 848 is configured to provide video input and/or output capabilities to the computing device. In some configurations, the video I/O interface component 848 includes a video connector configured to receive video as input from another device (e.g., a video media player such as a DVD or BLU-RAY player) or send video as output to another device (e.g., a monitor, a television, or some other external display). In some configurations, the video I/O interface component 848 includes a High-Definition Multimedia Interface (“HDMI”), mini-HDMI, micro-HDMI, DisplayPort, or proprietary connector to input/output video content. In some configurations, the video I/O interface component 848 or portions thereof is combined with the audio I/O interface component 846 or portions thereof.

The camera 850 can be configured to capture still images and/or video. The camera 850 may utilize a charge coupled device (“CCD”) or a complementary metal oxide semiconductor (“CMOS”) image sensor to capture images. In some configurations, the camera 850 includes a flash to aid in taking pictures in low-light environments. Settings for the camera 850 may be implemented as hardware or software buttons.

Although not illustrated, one or more hardware buttons may also be included in the computing device architecture 800. The hardware buttons may be used for controlling some operational aspect of the computing device. The hardware buttons may be dedicated buttons or multi-use buttons. The hardware buttons may be mechanical or sensor-based.

The illustrated power components 812 include one or more batteries 852, which can be connected to a battery gauge 854. The batteries 852 may be rechargeable or disposable. Rechargeable battery types include, but are not limited to, lithium polymer, lithium ion, nickel cadmium, and nickel metal hydride. Each of the batteries 852 may be made of one or more cells.

The battery gauge 854 can be configured to measure battery parameters such as current, voltage, and temperature. In some configurations, the battery gauge 854 is configured to measure the effect of a battery's discharge rate, temperature, age and other factors to predict remaining life within a certain percentage of error. In some configurations, the battery gauge 854 provides measurements to an application program that is configured to utilize the measurements to present useful power management data to a user. Power management data may include one or more of a percentage of battery used, a percentage of battery remaining, a battery condition, a remaining time, a remaining capacity (e.g., in watt hours), a current draw, and a voltage.
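As one hedged example, an application program might derive several of the power management data items listed above from battery gauge measurements as sketched below. The function name, parameter names, and dictionary keys are hypothetical, and the remaining-time estimate simply assumes that the present current draw and voltage stay constant; it does not model the effects of temperature, age, or discharge rate mentioned above.

```python
def power_report(full_wh: float, remaining_wh: float,
                 current_a: float, voltage_v: float) -> dict:
    """Derive illustrative power management data from gauge measurements.

    Assumptions: capacities are in watt hours, current draw is in amperes,
    and the draw stays constant until the battery is empty.
    """
    pct_remaining = 100.0 * remaining_wh / full_wh
    draw_w = current_a * voltage_v  # instantaneous power draw in watts
    # With no draw the battery would, under this simple model, last forever.
    hours_left = remaining_wh / draw_w if draw_w > 0 else float("inf")
    return {
        "percent_remaining": pct_remaining,
        "percent_used": 100.0 - pct_remaining,
        "remaining_capacity_wh": remaining_wh,
        "current_draw_a": current_a,
        "voltage_v": voltage_v,
        "remaining_time_h": hours_left,
    }


if __name__ == "__main__":
    print(power_report(full_wh=10.0, remaining_wh=4.0,
                       current_a=0.5, voltage_v=4.0))
```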

The power components 812 may also include a power connector, which may be combined with one or more of the aforementioned I/O components 810. The power components 812 may interface with an external power system or charging equipment via an I/O component.

What is claimed is:
 1. A method, comprising: receiving an input containing a plurality of words; determining a conditional count; determining an unconditional count; determining an adjustment factor for a pair of words of the plurality of words based on the unconditional count and the conditional count; generating a data structure defining a plurality of word clusters, individual word clusters of the plurality of word clusters include at least one word of the plurality of words; and reconstructing the adjustment factor of the pair of words based on a number of common clusters between individual words of the pair of words.
 2. The method of claim 1, further comprising: obtaining an input indicating word; and reconstructing the freshness value associated with one or more word clusters containing the word, the modification to the freshness value indicating that the one or more word clusters containing the word are more recent than other word clusters of the plurality of word clusters.
 3. The method of claim 1, further comprising: receiving a text entry; determining one or more word clusters of the plurality of word clusters associated with the text entry; obtaining a freshness factor associated with the one or more word clusters of the plurality of word clusters associated with the text entry; obtaining a related adjustment factor associated with the one or more word clusters; obtaining a language model value; determining a candidate probability associated with a word candidate based, at least in part, on the language model value and the related adjustment factor, wherein the word candidate is selected from individual words associated with the plurality of word clusters; and generating an output containing the word candidate based, at least in part, on the candidate probability.
 4. The method of claim 3, further comprising: determining a plurality of word candidates, wherein individual words of the plurality of word candidates comprise an individual candidate probability based, at least in part, on the language model value and the related adjustment factor; generating data indicating a ranking of the word candidate and the individual words of the plurality of word candidates based, at least in part, on the individual candidate probabilities and the candidate probability; and generating an output indicating the ranking.
 5. The method of claim 4, wherein the language model value comprises a probability in which a word associated with the text entry is universally used in a null context.
 6. The method of claim 1, wherein reconstructing the adjustment factor of the pair of words is also based on a ranking of at least one correlation between the words.
 7. The method of claim 1, wherein reconstructing the adjustment factor of the pair of words comprises: determining a cluster density for individual clusters of the plurality of clusters; determining an ordering of the plurality of clusters based on the cluster density for the individual clusters; and reconstructing the adjustment factor based on the ordering of the plurality of clusters and the number of common clusters between individual words of the pair of words.
 8. A computing device, comprising: a processor; and a memory having a set of computer-executable instructions stored thereupon which, when executed by the processor, cause the computing device to receive an input containing a plurality of words; determine a conditional count; determine an unconditional count; determine an adjustment factor for a pair of words of the plurality of words based on the unconditional count and the conditional count; generate a data structure defining a plurality of word clusters, individual word clusters of the plurality of word clusters include at least one word of the plurality of words; and reconstruct the adjustment factor of the pair of words based on a number of common clusters between individual words of the pair of words.
 9. The computing device of claim 8, wherein the computer-executable instructions cause the computing device to: obtain an input indicating word; and reconstruct the freshness value associated with one or more word clusters containing the word, the modification to the freshness value indicating that the one or more word clusters containing the word are more recent than other word clusters of the plurality of word clusters.
 10. The computing device of claim 8, wherein the computer-executable instructions cause the computing device to: receive a text entry; determine one or more word clusters of the plurality of word clusters associated with the text entry; obtain a freshness factor associated with the one or more word clusters of the plurality of word clusters associated with the text entry; obtain a related adjustment factor associated with the one or more word clusters; obtain a language model value; determine a candidate probability associated with a word candidate based, at least in part, on the language model value and the related adjustment factor, wherein the word candidate is selected from individual words associated with the plurality of word clusters; and generate an output containing the word candidate based, at least in part, on the candidate probability.
 11. The computing device of claim 10, wherein the computer-executable instructions cause the computing device to: determine a plurality of word candidates, wherein individual words of the plurality of word candidates comprise an individual candidate probability based, at least in part, on the language model value and the related adjustment factor; generate data indicating a rank of the word candidate and the individual words of the plurality of word candidates based, at least in part, on the individual candidate probabilities and the candidate probability; and display data indicating the rank of the word candidate and the individual words.
 12. The computing device of claim 11, wherein the language model value comprises a probability in which a word associated with the text entry is universally used in a null context.
 13. The computing device of claim 8, wherein reconstructing the adjustment factor of the pair of words is also based on a ranking of at least one correlation between the words.
 14. The computing device of claim 8, wherein reconstructing the adjustment factor of the pair of words comprises: determining a cluster density for individual clusters of the plurality of clusters; determining an ordering of the plurality of clusters based on the cluster density for the individual clusters; and reconstructing the adjustment factor based on the ordering of the plurality of clusters and the number of common clusters between individual words of the pair of words.
 15. A computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by a computing device, cause the computing device to: receive an input containing a plurality of words; determine a conditional count; determine an unconditional count; determine an adjustment factor for a pair of words of the plurality of words based on the unconditional count and the conditional count; generate a data structure defining a plurality of word clusters, individual word clusters of the plurality of word clusters include at least one word of the plurality of words; and reconstruct the adjustment factor of the pair of words based on a number of common clusters between individual words of the pair of words.
 16. The computer-readable storage medium of claim 15, wherein the computer-executable instructions cause the computing device to: obtain an input indicating word; and reconstruct the freshness value associated with one or more word clusters containing the word, the modification to the freshness value indicating that the one or more word clusters containing the word are more recent than other word clusters of the plurality of word clusters.
 17. The computer-readable storage medium of claim 15, wherein the computer-executable instructions cause the computing device to: receive a text entry; determine one or more word clusters of the plurality of word clusters associated with the text entry; obtain a freshness factor associated with the one or more word clusters of the plurality of word clusters associated with the text entry; obtain a related adjustment factor associated with the one or more word clusters; obtain a language model value; determine a candidate probability associated with a word candidate based, at least in part, on the language model value and the related adjustment factor, wherein the word candidate is selected from individual words associated with the plurality of word clusters; and generate an output containing the word candidate based, at least in part, on the candidate probability.
 18. The computer-readable storage medium of claim 17, wherein the computer-executable instructions cause the computing device to: determine a plurality of word candidates, wherein individual words of the plurality of word candidates comprise an individual candidate probability based, at least in part, on the language model value and the related adjustment factor; generate data indicating a rank of the word candidate and the individual words of the plurality of word candidates based, at least in part, on the individual candidate probabilities and the candidate probability; and display data indicating the rank of the word candidate and the individual words.
 19. The computer-readable storage medium of claim 15, wherein reconstructing the adjustment factor of the pair of words is also based on a ranking of at least one correlation between the words.
 20. The computer-readable storage medium of claim 15, wherein reconstructing the adjustment factor of the pair of words comprises: determining a cluster density for individual clusters of the plurality of clusters; determining an ordering of the plurality of clusters based on the cluster density for the individual clusters; and reconstructing the adjustment factor based on the ordering of the plurality of clusters and the number of common clusters between individual words of the pair of words.
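
By way of a non-limiting illustration of the steps recited in claim 1, the following Python sketch counts words and word pairs, derives an adjustment factor, and reconstructs a factor from the number of clusters a pair of words shares. Several details are assumptions made for this example only and are not taken from the claims: the unconditional count is taken to be the number of sentences containing a word, the conditional count is taken to be the number of sentences containing both words of a pair, the adjustment factor is modeled as the ratio of observed to expected co-occurrence, and the reconstruction raises a hypothetical `boost` constant to the number of common clusters. All function and parameter names are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_words(sentences):
    """Return (unconditional, conditional) counts for a list of sentences.

    Assumption: a word is counted once per sentence containing it, and a
    pair is counted once per sentence containing both of its words.
    """
    unconditional, conditional = Counter(), Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        unconditional.update(words)
        conditional.update(combinations(words, 2))  # pairs in sorted order
    return unconditional, conditional


def adjustment_factor(a, b, unconditional, conditional, n_sentences):
    """Observed co-occurrence divided by the count expected if independent."""
    first, second = sorted((a, b))
    expected = unconditional[first] * unconditional[second] / n_sentences
    return conditional[(first, second)] / expected if expected else 0.0


def common_clusters(a, b, clusters):
    """Count the clusters that contain both words of the pair."""
    return sum(1 for members in clusters.values() if a in members and b in members)


def reconstructed_factor(a, b, clusters, boost=2.0):
    """Approximate the adjustment factor from shared clusters alone."""
    return boost ** common_clusters(a, b, clusters)


if __name__ == "__main__":
    text = ["heart surgery doctor", "heart doctor", "army attack", "army tank"]
    uncond, cond = count_words(text)
    clusters = {
        "medical": {"heart", "doctor", "surgery"},
        "health": {"heart", "doctor"},
        "military": {"army", "attack", "tank"},
    }
    print(adjustment_factor("heart", "doctor", uncond, cond, len(text)))
    print(reconstructed_factor("heart", "doctor", clusters))
```

In this sketch the cluster table stands in for the full table of pairwise factors: once the clusters are built, a factor for any pair can be approximated from cluster membership without storing a value for every pair.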